
    HAPPI: an online database of comprehensive human annotated and predicted protein interactions

    Background: Human protein-protein interaction (PPI) data are the foundation for understanding molecular signalling networks and the functional roles of biomolecules. Several human PPI databases have become available; however, comparisons of these datasets have suggested limited data coverage and poor data quality. Ongoing collection and integration of human PPIs from different sources, both experimental and computational, can enable disease-specific network biology modelling in translational bioinformatics studies.
    Results: We developed a new web-based resource, the Human Annotated and Predicted Protein Interaction (HAPPI) database, located at http://bio.informatics.iupui.edu/HAPPI/. The HAPPI database was created by extracting and integrating publicly available protein interaction databases, including HPRD, BIND, MINT, STRING, and OPHID, using database integration techniques. We designed a unified entity-relationship data model to resolve semantic-level differences among the diverse concepts involved in PPI data integration. We applied a unified scoring model that gives each PPI a measure of its reliability and places it at one of five star-rank levels, from 1 to 5. We assessed the quality of PPIs contained in the new HAPPI database using evolutionarily conserved co-expression pairs, called "MetaGene" pairs, to measure the extent to which MetaGene pairs and PPI pairs overlap. While the overall quality of the HAPPI database across all star ranks is comparable to that of HPRD or IntNetDB, the subset of the HAPPI database with star ranks between 3 and 5 has a much higher average quality than all other human PPI databases. As of summer 2008, the database contains 142,956 non-redundant, medium- to high-confidence human protein interaction pairs among 10,592 human proteins. The HAPPI database web application also provides hyperlinked information on genes, pathways, protein domains, protein structure displays, and sequence feature maps for interactive exploration of PPI data in the database.
    Conclusion: HAPPI is by far the most comprehensive public compilation of human protein interaction information. It enables its users to fully explore PPI data with the quality measures and annotated information necessary for emerging network biology studies.
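The MetaGene-based quality check compares PPI pairs against conserved co-expression pairs. A minimal sketch of such an overlap measure follows, assuming interactions are undirected, protein and gene identifiers are already mapped to a shared namespace, and each PPI is given as a `(protein_a, protein_b, star_rank)` tuple (a hypothetical layout, not the HAPPI schema). It reports, per star rank, the fraction of PPI pairs that are also MetaGene pairs.

```python
from collections import defaultdict


def normalize_pair(a: str, b: str) -> tuple:
    """Treat interactions as undirected: (A, B) and (B, A) become the same key."""
    return (a, b) if a <= b else (b, a)


def overlap_by_star_rank(ppis, metagene_pairs):
    """Fraction of PPI pairs at each star rank that also appear as MetaGene pairs.

    ppis: iterable of (protein_a, protein_b, star_rank) tuples (hypothetical layout).
    metagene_pairs: iterable of (gene_a, gene_b) co-expression pairs.
    """
    reference = {normalize_pair(a, b) for a, b in metagene_pairs}
    by_rank = defaultdict(set)  # star rank -> set of undirected PPI pairs
    for a, b, rank in ppis:
        by_rank[rank].add(normalize_pair(a, b))
    return {
        rank: len(pairs & reference) / len(pairs)
        for rank, pairs in sorted(by_rank.items())
    }


ppis = [("A", "B", 5), ("B", "C", 5), ("C", "D", 1), ("D", "E", 1)]
metagene = [("B", "A"), ("C", "B")]
print(overlap_by_star_rank(ppis, metagene))  # {1: 0.0, 5: 1.0}
```

Under these assumptions, a higher overlap fraction at a given star rank suggests closer agreement with conserved co-expression; the quality measure used for HAPPI may be defined differently.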

    Pathway and network analysis in proteomics

    Proteomics is inherently a systems science that studies not only measured proteins and their expression levels in a cell, but also the interplay of proteins, protein complexes, signaling pathways, and network modules. Proteomics data have accumulated rapidly in recent years. However, proteomics data are highly variable, with results sensitive to data preparation methods, sample conditions, instrument types, and analytical methods. To address the challenges of proteomics data analysis, we review current tools being developed to incorporate biological function and network topological information. We categorize these tools into four types: tools with basic functional information and few topological features (e.g., GO category analysis), tools with rich functional information and few topological features (e.g., GSEA), tools with basic functional information and rich topological features (e.g., Cytoscape), and tools with rich functional information and rich topological features (e.g., PathwayExpress). We first review the potential application of these tools to proteomics; we then review tools that can achieve automated learning of pathway modules and features, and tools that help perform integrated network visual analytics.
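GO category analysis, the first tool type above, is commonly implemented as an over-representation test. As an illustration only (not a description of any specific tool reviewed), the one-sided hypergeometric tail probability of seeing at least k annotated proteins among n differentially expressed proteins can be computed with the standard library. The function name is hypothetical.

```python
from math import comb


def hypergeometric_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: proteins in the background, K: background proteins annotated to the GO term,
    n: differentially expressed proteins, k: how many of those carry the annotation.
    """
    upper = min(K, n)
    # Sum the probability mass for every overlap size from k up to the maximum possible.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# 3 of 3 selected proteins carry a term held by 3 of 10 background proteins.
print(hypergeometric_tail(3, 3, 3, 10))  # 1/120, about 0.00833
```

In practice such p-values are computed for every GO category and then corrected for multiple testing.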

    SLDR: a computational technique to identify novel genetic regulatory relationships

    We developed a new computational technique called Step-Level Differential Response (SLDR) to identify genetic regulatory relationships. Our technique takes advantage of functional genomics data for the same species under different perturbation conditions and is therefore complementary to current popular computational techniques. It is particularly suited to identifying "rare" activation/inhibition relationship events that can be difficult to find in experimental results. In SLDR, we model each candidate target gene as being controlled by N binary-state regulators, which lead to at most 2^N observable states ("step-levels") for the target. We applied SLDR to the GEO microarray data set GSE25644, which consists of 158 gene expression profiles of different S. cerevisiae mutants. For each target gene t, we first clustered the ordered samples, with each cluster approximating an observable step-level of t, to screen out "de-centric" targets. Then, we took each gene x as a candidate regulator and aligned t to x to examine the step-level correlation between the low-expression set of x (Ro) and the high-expression set of x (Rh), finding max f(t, x) = |Ro − Rh| over all candidates x in the genome for each t. We thereby obtained activation and inhibition events from different combinations of Ro and Rh. Furthermore, we developed criteria for filtering out less-confident regulators, estimated the number of regulators for each target t, and evaluated the top-ranking regulator-target relationships identified. Our results can be cross-validated with the Yeast Fitness database. SLDR is also computationally efficient, with O(N²) complexity. In summary, we believe SLDR can be applied to mining functional genomics big data for future network biology and network medicine applications.
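The core SLDR comparison contrasts a target's expression between the low- and high-expression sample sets of a candidate regulator. The simplified sketch below is not the published implementation: it assumes samples are split at the middle rank of the regulator (SLDR itself derives step-levels by clustering), summarizes Ro and Rh as mean target expression, and uses the sign of Rh − Ro to label activation or inhibition. Function names and the toy profiles are hypothetical.

```python
from statistics import mean


def step_level_response(target, regulator):
    """Mean target expression in high-regulator samples (Rh) minus low-regulator samples (Ro).

    Samples are ordered by regulator expression and split at the middle rank, a
    simplifying assumption; SLDR itself derives step-levels by clustering.
    """
    order = sorted(range(len(regulator)), key=lambda i: regulator[i])
    half = len(order) // 2
    r_o = mean(target[i] for i in order[:half])  # low-expression set of x
    r_h = mean(target[i] for i in order[half:])  # high-expression set of x
    return r_h - r_o


def rank_regulators(target, candidates):
    """Rank candidate regulators (name -> profile) by |Rh - Ro| for one target gene."""
    scored = []
    for name, profile in candidates.items():
        delta = step_level_response(target, profile)
        kind = "activation" if delta > 0 else "inhibition" if delta < 0 else "none"
        scored.append((name, abs(delta), kind))
    return sorted(scored, key=lambda s: s[1], reverse=True)


target = [1.0, 1.0, 5.0, 5.0]
candidates = {
    "xA": [0.1, 0.2, 0.9, 1.0],  # rises with the target
    "xI": [1.0, 0.9, 0.2, 0.1],  # falls as the target rises
    "xN": [0.5, 0.1, 0.6, 0.2],  # unrelated
}
for row in rank_regulators(target, candidates):
    print(row)
```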

    "Super Gene Set" Causal Relationship Discovery from Functional Genomics Data

    In this article, we present a computational framework to identify "causal relationships" among super gene sets. By "causal relationships," we mean both stimulatory and inhibitory regulatory relationships, whether through direct or indirect mechanisms. By super gene sets, we mean "pathways, annotated lists, and gene signatures," or PAGs. To identify causal relationships among PAGs, we extend previous work on identifying PAG-to-PAG regulatory relationships by further requiring them to be significantly enriched with gene-to-gene co-expression pairs across the two PAGs involved. This is achieved by developing a quantitative metric based on PAG-to-PAG Co-expressions (PPC), which we use to infer the likelihood that the PAG-to-PAG relationships under examination are causal, either stimulatory or inhibitory. Since true causal relationships are unknown, we approximate the overall performance of causal inference by how well causal PAG-to-PAG inference recalls known r-type PAG-to-PAG relationships, using a functional genomics benchmark dataset from the GEO database. We report an area-under-curve (AUC) performance of 0.81 for both precision and recall. By applying our framework to a myeloid-derived suppressor cell (MDSC) dataset, we further demonstrate that this framework is effective in helping build multi-scale biomolecular systems models with new insights on regulatory and causal links for downstream biological interpretation.
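The PPC metric is not specified in this abstract. As a naive stand-in, assuming a mapping from gene to expression profile and a fixed Pearson correlation cutoff (both hypothetical choices), one can count the fraction of cross-PAG gene pairs that are strongly positively or negatively co-expressed. This is an illustration of the idea of cross-PAG co-expression, not the published PPC computation.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cross_pag_coexpression(pag_a, pag_b, expr, threshold=0.8):
    """(positive, negative) fractions of cross-PAG gene pairs with |r| >= threshold.

    expr maps gene -> expression profile; genes without a profile are skipped.
    """
    pairs = [(g, h) for g, h in product(pag_a, pag_b)
             if g != h and g in expr and h in expr]
    if not pairs:
        return 0.0, 0.0
    pos = neg = 0
    for g, h in pairs:
        r = pearson(expr[g], expr[h])
        if r >= threshold:
            pos += 1
        elif r <= -threshold:
            neg += 1
    return pos / len(pairs), neg / len(pairs)


expr = {"a1": [1, 2, 3, 4], "a2": [4, 3, 2, 1], "b1": [2, 4, 6, 8]}
print(cross_pag_coexpression({"a1", "a2"}, {"b1"}, expr))  # (0.5, 0.5)
```

A real enrichment test would compare these fractions against a background rate, for example one estimated from randomly drawn gene pairs.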

    DMAP: a connectivity map database to enable identification of novel drug repositioning candidates

    BACKGROUND: Drug repositioning is a cost-efficient and time-saving approach to drug development compared to traditional techniques. A systematic approach to drug repositioning is to identify a candidate drug's gene expression profiles on target disease models and determine how similar these profiles are to those of approved drugs. Databases such as CMAP have been developed recently to help with systematic drug repositioning. METHODS: To overcome the limited data coverage of connectivity maps, we constructed a comprehensive in silico drug-protein connectivity map called DMAP, which contains directed drug-to-protein effects and effect scores. The drug-to-protein effect scores are compiled from all database entries in which the drug and protein have previously been observed together, and they provide a confidence measure on the quality of such drug-to-protein effects. RESULTS: In DMAP, we have compiled the direct effects between 24,121 PubChem Compound IDs (CIDs), which were mapped from 289,571 chemical entities recognized from public literature, and 5,196 reviewed UniProt proteins. DMAP compiles a total of 438,004 chemical-to-protein effect relationships. Compared to CMAP, DMAP shows a 221-fold increase in the number of chemicals and a 1.92-fold increase in the number of ATC codes. Furthermore, by overlapping DMAP chemicals with the approved drugs with known indications from the TTD database and literature, we obtained 982 drugs and 622 diseases, whereas we obtained only 394 drugs with known indications from CMAP. To validate the feasibility of applying the new DMAP to systematic drug repositioning, we compared the performance of DMAP and the well-known CMAP database on two popular computational techniques: a drug-drug-similarity-based method with leave-one-out validation and a Kolmogorov-Smirnov scoring-based method. In the drug-drug-similarity-based method, drug repositioning prediction using DMAP achieved an Area-Under-Curve (AUC) score of 0.82, compared with AUC = 0.64 using CMAP. For the Kolmogorov-Smirnov scoring-based method, with DMAP we were able to retrieve several drug indications that could not be retrieved using CMAP. DMAP data can be queried using the existing C2MAP server or downloaded freely at http://bio.informatics.iupui.edu/cmaps. CONCLUSIONS: Reliable measurements of how drugs affect disease-related proteins are critical to ongoing drug development in the genome medicine era. We demonstrated that DMAP can help drug development professionals assess drug-to-protein relationship data and improve the chances of success for systematic drug repositioning efforts.
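For the Kolmogorov-Smirnov scoring-based method, a widely used formulation is the CMAP-style connectivity statistic; whether DMAP applies exactly this form is an assumption. The sketch scores how strongly a set of query proteins (for example, a disease signature) concentrates at the top or bottom of a list ranked by drug effect. The function name and toy protein identifiers are hypothetical.

```python
def ks_connectivity(ranked_items, query):
    """CMAP-style Kolmogorov-Smirnov score of query items within a ranked list.

    ranked_items: items ordered from strongest positive to strongest negative effect.
    query: items to locate (e.g. a disease signature); items absent from the list are ignored.
    """
    n = len(ranked_items)
    position = {item: i + 1 for i, item in enumerate(ranked_items)}  # 1-based ranks
    v = sorted(position[q] for q in query if q in position)
    t = len(v)
    if t == 0:
        raise ValueError("no query item occurs in the ranked list")
    # a: largest excess of query items seen so far over the expected uniform share.
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    # b: largest deficit of query items relative to the uniform share.
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


ranked = ["P1", "P2", "P3", "P4"]
print(ks_connectivity(ranked, ["P1", "P2"]))  # 0.5: query at the top
print(ks_connectivity(ranked, ["P3", "P4"]))  # -0.75: query at the bottom
```

A positive score means the query items cluster near the top of the ranking and a negative score means they cluster near the bottom; CMAP combines separate scores for up- and down-regulated signature sets, which this sketch omits.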

    PAGER 2.0: an update to the pathway, annotated-list and gene-signature electronic repository for Human Network Biology

    Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data quality, and developing new software features to simplify online integrative GNPA. Specifically, we included 84,282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug-gene and miRNA-gene interactions, pathways, and tissue-specific gene expression. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and an RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/
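Compatibility with third-party GSEA tools usually means exporting gene sets in the tab-delimited GMT format, where each line holds a set name, a description, and the member genes. A minimal writer follows, assuming PAGs are available as `(pag_id, description, genes)` tuples; this layout and the example identifier are hypothetical, not the PAGER download schema.

```python
def to_gmt_lines(pags):
    """Yield one GMT line per PAG: name, description, then member genes, tab-separated."""
    for pag_id, description, genes in pags:
        # Tabs inside the description would shift the columns, so replace them.
        yield "\t".join([pag_id, description.replace("\t", " "), *genes])


def write_gmt(pags, path):
    """Write PAGs to a GMT file readable by GSEA-style tools."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in to_gmt_lines(pags):
            fh.write(line + "\n")


example = [("PAG_EXAMPLE_1", "hypothetical pathway", ["TP53", "MDM2", "CDKN1A"])]
print(next(to_gmt_lines(example)).split("\t"))
```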

    Proteomic characterization reveals that MMP-3 correlates with bronchiolitis obliterans syndrome following allogeneic hematopoietic cell and lung transplantation

    Improved diagnostic methods are needed for bronchiolitis obliterans syndrome (BOS), a serious complication after allogeneic hematopoietic cell transplantation (HCT) and lung transplantation. For candidate protein discovery, we compared plasma pools from HCT recipients with BOS at onset (n=12), pulmonary infection (n=16), chronic graft-versus-host disease without pulmonary involvement (n=15), and no chronic complications post-HCT (n=15). Pools were labeled with different tags [isobaric Tags for Relative and Absolute Quantification (iTRAQ)], and two software tools identified differentially expressed proteins (≥1.5-fold change). Candidate proteins were further selected using a six-step computational biology approach. The diagnostic value of the lead candidate, matrix metalloproteinase-3 (MMP-3), was evaluated by ELISA in plasma of a verification cohort (n=112) with and without BOS following HCT (n=76) or lung transplantation (n=36). MMP-3 plasma concentrations differed significantly between patients with and without BOS (AUC=0.77). Thus, MMP-3 represents a potential non-invasive blood test for the diagnosis of BOS.
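An AUC such as the 0.77 reported for MMP-3 can be read as the probability that a randomly chosen patient with BOS has a higher marker concentration than a randomly chosen patient without BOS. A minimal sketch of that Mann-Whitney formulation follows, counting ties as one half and assuming concentrations are given as plain lists of numbers; the values below are illustrative, not data from the study.

```python
def roc_auc(positives, negatives):
    """AUC as P(positive score > negative score), counting ties as one half."""
    if not positives or not negatives:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative values only, not data from the study.
print(roc_auc([3.0, 4.0], [1.0, 2.0]))  # 1.0: perfect separation
print(roc_auc([1.0, 2.0], [1.0, 2.0]))  # 0.5: no separation
```

The pairwise loop grows with the product of the two group sizes; a rank-based computation gives the same value and scales better for large cohorts.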

    Aqueous-phase reactive species formed by fine particulate matter from remote forests and polluted urban air

    In the aqueous phase, fine particulate matter can form reactive species (RS) that influence the aging, properties, and health effects of atmospheric aerosols. In this study, we explore the RS yields of aerosol samples from a remote forest (Hyytiälä, Finland) and polluted urban locations (Mainz, Germany; Beijing, China), and we relate the RS yields to different chemical constituents and reaction mechanisms. Ultra-high-resolution mass spectrometry was used to characterize organic aerosol composition, electron paramagnetic resonance (EPR) spectroscopy with a spin-trapping technique was applied to determine the concentrations of •OH, O2•−, and carbon- or oxygen-centered organic radicals, and a fluorometric assay was used to quantify H2O2. The aqueous H2O2-forming potential per mass unit of ambient PM2.5 (particle diameter < 2.5 μm) was roughly the same for all investigated samples, whereas the mass-specific yields of radicals were lower for sampling sites with higher concentrations of PM2.5. The abundances of water-soluble transition metals and aromatics in ambient PM2.5 were positively correlated with the relative fraction of •OH and negatively correlated with the relative fraction of carbon-centered radicals. In contrast, highly oxygenated organic molecules (HOM) were positively correlated with the relative fraction of carbon-centered radicals and negatively correlated with the relative fraction of •OH. Moreover, we found that the relative fractions of different types of radicals formed by ambient PM2.5 were comparable to those formed by surrogate mixtures comprising transition metal ions, organic hydroperoxide, H2O2, and humic or fulvic acids. The interplay of transition metal ions (e.g., iron and copper ions), highly oxidized organic molecules (e.g., hydroperoxides), and complexing or scavenging agents (e.g., humic or fulvic acids) leads to nonlinear concentration dependencies in aqueous-phase RS production.
A strong dependence on chemical composition was also observed for the aqueous-phase radical yields of laboratory-generated secondary organic aerosols (SOA) from precursor mixtures of naphthalene and β-pinene. Our findings show how the composition of PM2.5 can influence the amount and nature of aqueous-phase RS, which may explain differences in the chemical reactivity and health effects of particulate matter in clean and polluted air.